Conversation
mrambacher left a comment:
Generally I like where this is heading. I have some questions on CompressionInfo (whether it is turned upside down enough) and on the CompressorRegistry but I think this is a very good thing to have.
Force-pushed bfcc7c6 to c81fd0e.
Addressed some of the comments above and rebased the branch. There are two additional changes from the previous version.
Force-pushed c81fd0e to af7a4ea.
Force-pushed 2982813 to 004b920.
Keep in mind that for us to accept this, the performance penalty would have to be minimal.
Absolutely. Performance is a key requirement for this feature. From the data I've collected so far with db_bench (readrandom and readrandomwriterandom workloads), the performance difference is generally <1% (throughput, p99 latency). I'm collecting more data covering different conditions and additional metrics, so we can have a more accurate picture.
Force-pushed 004b920 to 55ca263.
db/column_family.cc (outdated):
Is there any possibility that this compressor is null?
There was a problem hiding this comment.
- compressor_per_level is initialized as an empty vector.
- MutableCFOptions::RefreshDerivedOptions populates compressor_per_level using the content of compression_per_level.
- Compressors are obtained from BuiltinCompressor::GetCompressor, which could return nullptr in case of an invalid selection in compression_per_level.
I thought an invalid selection in compression_per_level would be caught before reaching this method, but that's incorrect. I added a null check and a test in compression_test (to verify the expected error is returned in case of invalid selection in compression_per_level).
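A minimal sketch of the guard described above (names like `ValidatePerLevel` are illustrative, not the PR's actual code): an invalid entry in compression_per_level makes the lookup return nullptr, which should surface as an error rather than a crash.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a built-in compressor; names here are illustrative only.
struct Compressor {
  explicit Compressor(std::string name) : name_(std::move(name)) {}
  std::string name_;
};

// Mimics BuiltinCompressor::GetCompressor returning nullptr on a bad selection.
std::shared_ptr<Compressor> GetCompressor(const std::string& name) {
  static const std::vector<std::string> kKnown = {"snappy", "zstd", "lz4"};
  for (const auto& k : kKnown) {
    if (k == name) return std::make_shared<Compressor>(name);
  }
  return nullptr;
}

// The guard pattern: surface an error instead of dereferencing null later.
bool ValidatePerLevel(const std::vector<std::string>& per_level,
                      std::string* err) {
  for (const auto& name : per_level) {
    if (!GetCompressor(name)) {
      *err = "Invalid compression type in compression_per_level: " + name;
      return false;
    }
  }
  return true;
}
```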
db/db_test.cc (outdated):
- I wonder if output_compression should be renamed to compressor or output_compressor?
- Does it make sense to have a compaction->GetCompressionType shortcut method?
Renamed output_compression to output_compressor.
I did not add the shortcut method for compression type. I see both GetCompressionType and GetId called on the compressor returned by output_compressor in different places.
db/db_test2.cc (outdated):
Is this meant to be the size of the uncompression dictionary itself or of the struct?
In a previous version of the code, I added a Compressor* member to UncompressionDict, and these tests started failing due to the size being slightly larger than 2 blocks. With the current code, these changes are actually not needed, and they were removed.
fillseq and readrandom with -readonly should suffice, with a low block cache hit rate on data blocks. You can do readrandom on the same DB before-and-after to eliminate LSM structure variances. But I would consider 1% overall ops/sec a significant regression for Meta to absorb for a feature that Meta doesn't need.
Force-pushed 55ca263 to 7bd8334.
I ran fillseq and readrandom benchmarks, comparing the PR against the previous commit on the main branch. The relative ops/s differences (PR vs main, mean of 5 runs) are listed below, although they are not statistically significant.
The instruction path length is also verified to match within 0.2%.
Force-pushed 7bd8334 to 871e767.
Force-pushed 871e767 to 8348e5b.
Force-pushed 8348e5b to d239489.
@mrambacher has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed d239489 to 558dc2e.
@lucagiac81 has updated the pull request. You must reimport the pull request before landing.
Force-pushed 558dc2e to 44fe2ad.
Force-pushed 44fe2ad to 5d6b3d7.
Force-pushed 5d6b3d7 to c45c959.
util/compressor.h (outdated):
Thinking out loud, I wonder if the names of the Compress and Uncompress functions should be different. I can see there eventually being (at least) three signatures:
- This method that takes a block (Slice) and returns a block (output)
- An asynchronous method that does something similar, invoking some sort of callback when the compression is complete
- A method for the compression of streams
With this in mind, I wonder if these methods should be named "CompressBlock" (or something similar) instead of just Compress
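A sketch of the naming idea in this comment (hypothetical signatures, not the PR's actual API): distinct names like CompressBlock leave room for asynchronous and streaming variants later. Slice is approximated by std::string_view here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

using Slice = std::string_view;  // stand-in for rocksdb::Slice

class Compressor {
 public:
  virtual ~Compressor() = default;
  // 1) Synchronous: one block in, one block out.
  virtual bool CompressBlock(const Slice& input, std::string* output) = 0;
  // 2) A future asynchronous variant could invoke a callback on completion
  //    (shown here trivially delegating to the synchronous path).
  virtual void CompressBlockAsync(const Slice& input,
                                  std::function<void(bool, std::string)> done) {
    std::string out;
    bool ok = CompressBlock(input, &out);
    done(ok, std::move(out));
  }
  // 3) A streaming variant would operate on an open stream (omitted).
};

// Trivial "identity" compressor just to exercise the interface.
class IdentityCompressor : public Compressor {
 public:
  bool CompressBlock(const Slice& input, std::string* output) override {
    output->assign(input.data(), input.size());
    return true;
  }
};
```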
util/compressor.cc (outdated):
assert(compression_dict)
and maybe assert that this Compressor supports a Dictionary? Or return an error if it does not?
An assert was added for compression_dict (in BlockBasedTableBuilder::EnterUnbuffered, where CreateDictionary is called). A check for DictCompressionSupported was added in CreateDictionary, along with a comment noting that IsDictionaryEnabled returns false if DictCompressionSupported is false.
util/compressor.cc (outdated):
Since this is referring to ZSTD methods, this seems like it should be done in a derived (not the base) class, even if multiple compressors might use the same dictionary trainer/finalize methods.
Can we add a virtual method to build the dictionary from the samples?
PR #6717 already introduced a separate SampleAndTrainDictionary method to make it easier for plugins to override only parts of CreateDictionary (and use the default sampling/training methods inside a customized implementation of CreateDictionary). I now split sampling and training into two virtual methods (SampleDict and TrainDict). TrainDict's default implementation still uses ZSTD_TrainDictionary and ZSTD_FinalizeDictionary, so plugins are able to use those as defaults.
I also renamed CreateDictionary to CreateDict and IsDictionaryEnabled to IsDictEnabled for consistency with other methods.
Some more tweaks may be needed to finalize the interface.
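The split described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the method names follow the reply (SampleDict, TrainDict, CreateDict), but the bodies are placeholders, not the real ZSTD-backed implementations.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Compressor {
 public:
  virtual ~Compressor() = default;

  // Step 1: pick training samples out of the collected data blocks.
  virtual std::vector<std::string> SampleDict(
      const std::vector<std::string>& blocks) {
    // Placeholder default: use every block as a sample.
    return blocks;
  }

  // Step 2: train a dictionary from the samples. The real default uses the
  // ZSTD training/finalization helpers; here we just concatenate.
  virtual std::string TrainDict(const std::vector<std::string>& samples) {
    std::string dict;
    for (const auto& s : samples) dict += s;
    return dict;
  }

  // CreateDict composes the two steps; a plugin can override either one
  // independently while reusing the other's default.
  std::string CreateDict(const std::vector<std::string>& blocks) {
    return TrainDict(SampleDict(blocks));
  }
};
```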
util/compressor.cc (outdated):
I am not in favor of this mechanism. Is there a reason not to just make these virtual and have the derived class do the right thing?
Also, are dictionaries not supported in LITE mode?
I agree. All compressors so far override these methods. The methods are now changed to simply return a default value. This also eliminates the different behavior for the LITE version, which doesn't support GetOption.
util/compressor.h (outdated):
There has been an effort to not return raw pointers as much in the code. Can you tell me if these can easily be made into std::unique_ptr? That would satisfy the "not raw" goal and can help prevent leaks.
Simple change. CreateDictionary was already placing the raw pointer in a std::unique_ptr. We cannot do the same for UncompressionDict without further changes, as the Create method of BlocklikeTraits returns a raw pointer, and that's aligned with other specializations.
tools/db_bench_tool.cc (outdated):
This might be cleaner as:
CompressionInfo info;
info.sample_for_compression = FLAGS_sample_for_compression;
(and eliminate the empty dictionary and format version)
Modified CompressionDict constructor to take sample_for_compression as parameter with default value. This made the code in db_bench_tool.cc more concise.
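A rough sketch of how a defaulted parameter removes boilerplate at call sites. The actual signatures in the PR differ; the struct and parameter names here are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Illustrative stand-ins, not the real RocksDB types.
struct CompressionDict {
  std::string raw;
};

struct CompressionInfo {
  CompressionDict dict;
  uint64_t sample_for_compression;
  // With defaults, a caller that only cares about sampling writes
  // CompressionInfo info(FLAGS_sample_for_compression); instead of also
  // spelling out an empty dictionary and a format version.
  explicit CompressionInfo(uint64_t sample = 0, CompressionDict d = {})
      : dict(std::move(d)), sample_for_compression(sample) {}
};
```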
tools/db_bench_tool.cc (outdated):
Same comment as above
tools/db_bench_tool.cc (outdated):
Same comment as above
utilities/blob_db/blob_db_impl.cc (outdated):
Is this just UncompressionInfo info; (like CompressionInfo above)?
Yes, it is. Updated.
test_util/testutil.cc (outdated):
Not that this is your bug, but 6 seems wrong here. The built-in types go from 0x0 to 0x7 (so 8 of them).
Force-pushed c45c959 to 6927ed5.
Force-pushed 85818a2 to 8f4a3e6.
Force-pushed 8f4a3e6 to b738894.
Force-pushed b738894 to 6acf396.
Force-pushed 6acf396 to 7727d48.
Force-pushed 7727d48 to 374251f.
Force-pushed 374251f to 89fc8cc.
Force-pushed 89fc8cc to 72af189.
Force-pushed 72af189 to e267425.
rep_->table_properties->compression_name !=
    CompressionTypeToString(kNoCompression);
ConfigOptions config_options;
s = Compressor::CreateFromString(config_options,
I don't like that this implementation requires dependency injection (ObjectRegistry) (also known as "spooky action at a distance") to use a custom compressor. Reliable dependency injection can be a pain point for some people integrating RocksDB, e.g. through a shared object file.
Using prefix_extractor as an example, when the SST file uses the same prefix extractor as what is currently configured (for new SST files), dependency injection is not required to use it for reads.
Compressor::CreateFromString will reuse an existing instance if an equivalent one is already available. The existing instances are tracked by the Compressor class rather than passed to the table like prefix_extractor.
To follow a similar approach as for prefix_extractor, it should be possible to pass a custom compressor instance to RocksDB via options, without adding a factory function to ObjectRegistry. Would this address the concern with dependency injection? I will update PR #6717 (which adds support for custom compressors) to include and test this scenario.
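The instance-reuse behavior described above can be sketched as follows. This is a hypothetical illustration of the idea (a registry keyed by the compressor's id hands back an existing instance when one is alive), not the actual Compressor::CreateFromString implementation, and it omits thread-safety for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Compressor {
  explicit Compressor(std::string id) : id_(std::move(id)) {}
  std::string id_;

  static std::shared_ptr<Compressor> CreateFromString(const std::string& id) {
    // Weak pointers track live instances without keeping them alive forever.
    static std::map<std::string, std::weak_ptr<Compressor>> instances;
    if (auto existing = instances[id].lock()) {
      return existing;  // equivalent instance already tracked: reuse it
    }
    auto fresh = std::make_shared<Compressor>(id);
    instances[id] = fresh;
    return fresh;
  }
};
```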
std::shared_ptr<Compressor> GetCompressor(CompressionType compression_type) {
  if (compression_type == compressor->GetCompressionType()) {
Important to note that only one custom compression type can be used per file. Allowing various custom compressors in a single file would be a substantial schema+code change. Yes, someone could add their own layer within the custom compressor, but that would be extra overhead.
We don't currently foresee needing various compressions within a file (except for some blocks not compressed) but we want to be sure everyone is on board with this decision.
Yes, that is a limitation for plugin compressors. For built-in compressors, it's still possible to have different compressors per file. Their CompressionType id is still stored in each block as before. Plugins, on the other hand, all have the same id and rely on the compressor name in the properties block.
If various custom compressors per file are needed, we may be able to handle it by assigning "dynamic" numeric ids to custom compressors and recording them in blocks as usual. I can look into the implementation details if there is interest.
we may be able to handle it by assigning "dynamic" numeric ids to custom compressors and recording them in blocks as usual
Right. Might be worth keeping this in mind as a possible future feature, especially for schema purposes. I think the ideal state right now would be to reserve what that would look like in the schema, detect on read if that future feature is in use and provide a good error message.
s = UncompressSerializedBlock(
    info, req.result.data() + req_offset, handle.size(), &contents,
    footer.format_version(), rep_->ioptions, memory_allocator);
std::shared_ptr<Compressor> compressor =
Creating temporary copies of shared_ptrs on the read "hot path" is a performance hazard.
In the past we have had to re-engineer some new customizable integrations because of excessive shared_ptr ops.
The copy of shared_ptr is needed here in case the block is compressed by a different compressor than the one specified for the table (in rep_). I'll try to remove the copy at least for the most common case (block compressed with the table compressor).
Yeah, I think that's an over-simplified interface. There should be immortal (de-)compressors for built-in algorithms. And the needed custom compressors should be known and kept alive by the table reader (shared_ptr). (Are the compressors thread-safe by the way?)
While it might be nice to have a "GetCompressor" that copies a shared_ptr for arbitrary lifetime extension, the one used most would be to get a (de-)compressor reference or pointer that we promise not to use beyond the life of the table reader (which is guaranteed by the table cache to survive pending read operations).
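The two access patterns discussed here can be sketched as below. This is an illustrative design sketch, not RocksDB code: the table reader owns the compressor via shared_ptr; the hot read path borrows a raw pointer (no refcount traffic) for use bounded by the reader's lifetime, while callers needing arbitrary lifetime extension pay for a shared_ptr copy explicitly.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Compressor {
  std::string id;
};

class TableReader {
 public:
  explicit TableReader(std::shared_ptr<Compressor> c)
      : compressor_(std::move(c)) {}

  // Hot path: borrow. Valid only while this TableReader is alive, which the
  // table cache guarantees for pending read operations.
  const Compressor* compressor() const { return compressor_.get(); }

  // Cold path: copy the shared_ptr for arbitrary lifetime extension.
  std::shared_ptr<Compressor> GetCompressorShared() const {
    return compressor_;
  }

 private:
  std::shared_ptr<Compressor> compressor_;
};
```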
const PersistentCacheOptions& cache_options,
MemoryAllocator* memory_allocator, bool for_compaction,
const std::shared_ptr<Compressor>& compressor, bool async_read) {
I consider const shared_ptr<X>& parameters to be an anti-pattern. It prevents move semantics where it might be useful, and is non-committal about intention to save a reference. I suggest deciding between const X& and plain shared_ptr<X>.
I see the "mixed message" of passing const shared_ptr&. I'll review all occurrences and verify if they're justified (e.g., if a copy of the shared_ptr is eventually made).
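The suggested convention can be sketched as follows (illustrative names, not the PR's code): take a plain shared_ptr<X> by value when the callee keeps a reference (enabling moves at the call site), or const X& when it only borrows; const shared_ptr<X>& signals neither clearly.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Compressor {
  int level = 0;
};

class Holder {
 public:
  // Intends to keep a reference: take by value, then move into the member.
  // Callers that pass an rvalue avoid any refcount bump.
  void SetCompressor(std::shared_ptr<Compressor> c) {
    compressor_ = std::move(c);
  }
  const Compressor* get() const { return compressor_.get(); }

 private:
  std::shared_ptr<Compressor> compressor_;
};

// Only borrows: no shared_ptr in the signature at all.
int ReadLevel(const Compressor& c) { return c.level; }
```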
UncompressionDict& operator=(const CompressionDict&) = delete;
};
// Interface for each compression algorithm to implement.
What are the thread safety expectations?
By the way, we are finally very interested in this line of features. Thanks for working on it!
That's great, thank you! I'll work on the items above, and let's continue the review. I'll also rebase, as I see some new conflicts.
Hi. We are planning to resume the pluggable compression work covered by this PR and PR #6717. We'd like to confirm there is still interest in driving this forward. If so, we will rebase on the latest version, address the feedback above, and coordinate a review when ready. Please let us know. Looking forward to completing this feature and making it available in RocksDB!
Force-pushed e267425 to 494b2ac.
Summary: Adds new classes etc. in internal compression.h that are intended to become public APIs for supporting custom/pluggable compression. Some steps remain to allow for pluggable compression and to remove a lot of legacy code (e.g. now called `OLD_CompressData` and `OLD_UncompressData`), but this change refactors the key integration points of SST building and reading and compressed secondary cache over to the new APIs.

Compared with the proposed #7650, this fixes a number of issues including:
* Making a clean divide between public and internal APIs (currently just indicated with comments)
* Enough generality that built-in compressions generally fit into the framework rather than needing special treatment
* Avoiding exposure of obnoxious idioms like `compress_format_version` to the user
* Enough generality that a compressor mixing algorithms/strategies from other compressors is pretty well supported without an extra schema layer
* Explicit thread-safety contracts (carefully considered)
* Contract details around schema compatibility and extension with code changes (more detail in next PR)
* Customizable "working areas" (e.g. for ZSTD "context")
* Decompression into an arbitrary memory location (rather than involving the decompressor in memory allocation; should facilitate reducing the number of objects in block cache)

Pull Request resolved: #13540

Test Plan: This is currently an internal refactor. More testing will come when the new API is migrated to the public API. A test in db_block_cache_test is updated to meaningfully cover a case (cache warming compression dictionary block) that was previously only covered in the crash test.

SST write performance test, like #13583. Compile with CLANG, run before & after simultaneously:

```
SUFFIX=`tty | sed 's|/|_|g'`; for ARGS in "-compression_parallel_threads=1 -compression_type=none" "-compression_parallel_threads=1 -compression_type=snappy" "-compression_parallel_threads=1 -compression_type=zstd" "-compression_parallel_threads=1 -compression_type=zstd -verify_compression=1" "-compression_parallel_threads=1 -compression_type=zstd -compression_max_dict_bytes=8180" "-compression_parallel_threads=4 -compression_type=snappy"; do echo $ARGS; (for I in `seq 1 20`; do ./db_bench -db=/dev/shm/dbbench$SUFFIX --benchmarks=fillseq -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=1000 -fifo_compaction_allow_compaction=0 -disable_wal -write_buffer_size=12000000 $ARGS 2>&1 | grep micros/op; done) | awk '{n++; sum += $5;} END { print int(sum / n); }'; done
```

Before (this PR and with #13583 reverted):
* -compression_parallel_threads=1 -compression_type=none: 1908372
* -compression_parallel_threads=1 -compression_type=snappy: 1926093
* -compression_parallel_threads=1 -compression_type=zstd: 1208259
* -compression_parallel_threads=1 -compression_type=zstd -verify_compression=1: 997583
* -compression_parallel_threads=1 -compression_type=zstd -compression_max_dict_bytes=8180: 934246
* -compression_parallel_threads=4 -compression_type=snappy: 1644849

After:
* -compression_parallel_threads=1 -compression_type=none: 1956054 (+2.5%)
* -compression_parallel_threads=1 -compression_type=snappy: 1911433 (-0.8%)
* -compression_parallel_threads=1 -compression_type=zstd: 1205668 (-0.3%)
* -compression_parallel_threads=1 -compression_type=zstd -verify_compression=1: 999263 (+0.2%)
* -compression_parallel_threads=1 -compression_type=zstd -compression_max_dict_bytes=8180: 934322 (+0.0%)
* -compression_parallel_threads=4 -compression_type=snappy: 1642519 (-0.2%)

Pretty neutral change(s) overall.

SST read performance test (related to #13583). Set up:

```
for COMP in none snappy zstd; do echo $ARGS; ./db_bench -db=/dev/shm/dbbench-$COMP --benchmarks=fillseq,flush -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=1000 -fifo_compaction_allow_compaction=0 -disable_wal -write_buffer_size=12000000 -compression_type=$COMP; done
```

Test (compile with CLANG, run before & after simultaneously):

```
for COMP in none snappy zstd; do echo $COMP; (for I in `seq 1 5`; do ./db_bench -readonly -db=/dev/shm/dbbench-$COMP --benchmarks=readrandom -num=10000000 -duration=20 -threads=8 2>&1 | grep micros/op; done) | awk '{n++; sum += $5;} END { print int(sum / n); }'; done
```

Before (this PR and with #13583 reverted):
* none: 1495646
* snappy: 1172443
* zstd: 706036
* zstd (after constructing with -compression_max_dict_bytes=8180): 656182

After:
* none: 1494981 (-0.0%)
* snappy: 1171846 (-0.1%)
* zstd: 696363 (-1.4%)
* zstd (after constructing with -compression_max_dict_bytes=8180): 667585 (+1.7%)

Pretty neutral.

Reviewed By: hx235
Differential Revision: D74626863
Pulled By: pdillinger
fbshipit-source-id: dc8ff3178da9b4eaa7c16aa1bb910c872afaf14a
This PR refactors compression by introducing a Compressor interface.
This is a first step towards supporting compressors as plugins. PR #6717 covers the next step.
Compressor class
The Compressor class defines an interface for each compression algorithm to implement. It is a Customizable class, like other extensible components in RocksDB.
A Compressor has
Streaming compression is not included in the Compressor class yet. The plan is to cover that in a separate PR.
Built-in compressors
The existing compression algorithms (referred to as "built-in") are encapsulated in Compressor classes. The classes inherit from BuiltinCompressor, which includes functionality shared by all built-in compressors.
Built-in compressors can be referenced by their numeric id (as defined by the CompressionType enum) to ensure backward compatibility. BuiltinCompressor uses the existing CompressionOptions struct as its configurable options.
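The backward-compatible numeric-id mapping can be illustrated as below. The enum values follow the RocksDB CompressionType numbering (0x0 through 0x7 for the built-in types); the BuiltinCompressor struct and GetBuiltin lookup are simplified stand-ins, not the actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Historical numeric ids, as stored in on-disk blocks.
enum class CompressionType : uint8_t {
  kNoCompression = 0x0,
  kSnappyCompression = 0x1,
  kZlibCompression = 0x2,
  kBZip2Compression = 0x3,
  kLZ4Compression = 0x4,
  kLZ4HCCompression = 0x5,
  kXpressCompression = 0x6,
  kZSTD = 0x7,
};

// Simplified stand-in: each built-in compressor carries its legacy numeric
// id alongside its string name, so lookups by id keep working.
struct BuiltinCompressor {
  CompressionType type;
  std::string name;
};

BuiltinCompressor GetBuiltin(CompressionType t) {
  switch (t) {
    case CompressionType::kSnappyCompression:
      return {t, "Snappy"};
    case CompressionType::kZSTD:
      return {t, "ZSTD"};
    default:
      return {t, "NoCompression"};
  }
}
```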
Compressor lifecycle
For this PR, compression options exposed in the public API are unchanged (exposing Compressor in the public API and options is the scope of PR #6717).